Introduction

Housing valuation is an area in which statistical models can play a role, and the models commonly used for it can also be applied to other price structures. This project is concerned with finding the most reliable determinants of property prices. The dataset is a subset of anonymised mortgage records for Greater London. The purchase price (which is different from the asking price) is available, as is a series of characteristics of each property. The goal is to find the best group of predictors of property price.

Describing the methods for property price prediction

Data Preparation

Variables in Original Dataset

## Observations: 12,536
## Variables: 31
## $ X        <int> 53, 73, 78, 95, 125, 153, 182, 189, 203, 207, 215, 21...
## $ Easting  <int> 545500, 525000, 531100, 538500, 534000, 528700, 53490...
## $ Northing <int> 173000, 177800, 183400, 169400, 168400, 168800, 18700...
## $ Purprice <int> 85000, 71000, 60000, 64000, 260000, 48500, 34500, 559...
## $ BldIntWr <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ BldPostW <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ Bld60s   <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0,...
## $ Bld70s   <int> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Bld80s   <int> 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0,...
## $ TypDetch <int> 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,...
## $ TypSemiD <int> 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0,...
## $ TypFlat  <int> 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0,...
## $ GarSingl <int> 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1,...
## $ GarDoubl <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Tenfree  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,...
## $ CenHeat  <int> 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1,...
## $ BathTwo  <int> 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ BedTwo   <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ BedThree <int> 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1,...
## $ BedFour  <int> 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,...
## $ BedFive  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ NewPropD <int> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,...
## $ FlorArea <dbl> 76.16146, 98.45262, 124.73761, 127.00000, 190.40366, ...
## $ NoCarHh  <dbl> 50.2793, 14.6342, 36.4162, 17.8082, 7.4074, 14.0187, ...
## $ CarspP   <dbl> 25.2451, 46.8865, 37.8049, 47.5936, 67.6966, 49.5512,...
## $ ProfPct  <dbl> 0.0000, 6.2500, 0.0000, 0.0000, 9.0909, 16.6667, 0.00...
## $ UnskPct  <dbl> 11.1111, 0.0000, 11.1111, 0.0000, 0.0000, 8.3333, 0.0...
## $ RetiPct  <dbl> 88.8889, 12.5000, 77.7778, 75.0000, 36.3636, 50.0000,...
## $ Saleunem <dbl> 19.2308, 5.3571, 5.2632, 8.8235, 3.6765, 3.0769, 9.74...
## $ Unemploy <dbl> 85.53494, 32.82623, 31.61733, 0.12889, 21.88766, 31.6...
## $ PopnDnsy <dbl> 11.48515, 8.29268, 7.81671, 18.18182, 8.22222, 3.7523...

Data Cleaning

Convert dummies to factors - more convenient for modelling.

To build a model that predicts property prices in London, several groups of dummy variables are first combined into single factors. In each group, the level that has no dummy of its own applies when all dummies in the group are 0.

  • Age: the period in which the property was built. It is derived from BldIntWr, BldPostW, Bld60s, Bld70s and Bld80s. Its levels are PreWW1, BldIntWr, BldPostW, Bld60s, Bld70s and Bld80s.

  • Type: the type of building. It is derived from TypDetch, TypSemiD and TypFlat. Its levels are TypDetch, TypSemiD, TypFlat and Bungalow.

  • Garage: the garage provision of the property. It is derived from GarSingl and GarDoubl. Its levels are HardStnd, GarSingl and GarDoubl.

  • Bedrooms: the number of bedrooms in the property. It is derived from BedTwo, BedThree, BedFour and BedFive. Its levels are BedOne, BedTwo, BedThree, BedFour and BedFive.
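The recoding described in the list above can be sketched in base R as follows. This is a minimal illustration on made-up rows; `collapse_dummies()` is a hypothetical helper, not a function from the original analysis, and it assumes that at most one dummy in a group equals 1.

```r
# Toy rows with the same garage dummies as the original dataset (values made up).
toy <- data.frame(
  GarSingl = c(1, 0, 0, 1),
  GarDoubl = c(0, 1, 0, 0)
)

# collapse_dummies() is a hypothetical helper: for each row it returns the name
# of the dummy that equals 1, or `baseline` when every dummy in the group is 0.
collapse_dummies <- function(df, dummies, baseline) {
  m <- as.matrix(df[, dummies, drop = FALSE])
  idx <- max.col(m, ties.method = "first")      # column of the first maximum per row
  label <- ifelse(rowSums(m) == 0, baseline, dummies[idx])
  factor(label, levels = c(baseline, dummies))  # baseline becomes the reference level
}

toy$Garage <- collapse_dummies(toy, c("GarSingl", "GarDoubl"), "HardStnd")
print(toy$Garage)
```

Putting the baseline first makes it the reference level in `lm()`, which matches the Garage coefficients reported later (GarageGarSingl and GarageGarDoubl, with HardStnd absorbed into the intercept).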

Variables in New Dataset

## Observations: 12,536
## Variables: 20
## $ Easting  <int> 545500, 525000, 531100, 538500, 534000, 528700, 53490...
## $ Northing <int> 173000, 177800, 183400, 169400, 168400, 168800, 18700...
## $ Purprice <int> 85000, 71000, 60000, 64000, 260000, 48500, 34500, 559...
## $ Tenfree  <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,...
## $ CenHeat  <fct> 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1,...
## $ BathTwo  <fct> 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ NewPropD <fct> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,...
## $ FlorArea <dbl> 76.16146, 98.45262, 124.73761, 127.00000, 190.40366, ...
## $ ProfPct  <dbl> 0.0000, 6.2500, 0.0000, 0.0000, 9.0909, 16.6667, 0.00...
## $ Age      <fct> Bld60s, Bld80s, PreWW1, Bld80s, Bld80s, PreWW1, Bld70...
## $ Type     <fct> TypDetch, TypDetch, TypSemiD, TypDetch, TypDetch, Typ...
## $ Garage   <fct> GarSingl, GarSingl, HardStnd, GarSingl, GarDoubl, Har...
## $ Bedrooms <fct> BedThree, BedThree, BedFour, BedThree, BedFour, BedTh...
## $ NoCarHh  <dbl> 50.2793, 14.6342, 36.4162, 17.8082, 7.4074, 14.0187, ...
## $ CarspP   <dbl> 25.2451, 46.8865, 37.8049, 47.5936, 67.6966, 49.5512,...
## $ UnskPct  <dbl> 11.1111, 0.0000, 11.1111, 0.0000, 0.0000, 8.3333, 0.0...
## $ RetiPct  <dbl> 88.8889, 12.5000, 77.7778, 75.0000, 36.3636, 50.0000,...
## $ Saleunem <dbl> 19.2308, 5.3571, 5.2632, 8.8235, 3.6765, 3.0769, 9.74...
## $ Unemploy <dbl> 85.53494, 32.82623, 31.61733, 0.12889, 21.88766, 31.6...
## $ PopnDnsy <dbl> 11.48515, 8.29268, 7.81671, 18.18182, 8.22222, 3.7523...

Data Exploration

Exploration of Dependent Variable

The Purprice variable.

Most properties sold for under 600,000, but there is one outlier with a much higher price than the rest. It would distort the results of the analysis.

The outlier above 600,000 is therefore deleted.
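The removal step can be sketched as below. The data frame is a made-up toy and the source does not show the code that was actually used; only the 600,000 threshold comes from the text.

```r
# Toy purchase prices (made up); the last one mimics the single extreme outlier.
toy <- data.frame(Purprice = c(85000, 71000, 60000, 260000, 850000))

# Keep only rows at or below the threshold used in the text.
threshold <- 600000
cleaned <- toy[toy$Purprice <= threshold, , drop = FALSE]

# Number of rows removed by the filter.
print(nrow(toy) - nrow(cleaned))
```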

Exploration of Independent Variable (Continuous Variables)

Plot the Purprice versus FlorArea.

Floor area and price show a roughly linear relationship: the slope is approximately constant, no clear curvature is present, and price increases as floor area increases.

Plot the Purprice versus ProfPct.

Since ProfPct only takes on a few values, a linear relationship is inadequate.

Plot the Purprice versus NoCarHh.

Plot the Purprice versus CarspP.

Plot the Purprice versus UnskPct.

Plot the Purprice versus RetiPct.

Plot the Purprice versus Saleunem.

Plot the Purprice versus Unemploy.

Plot the Purprice versus PopnDnsy.

Based on the plots above, none of these variables has a strong linear relationship with the dependent variable, property price. For cars per person in the neighbourhood and the proportion of households with an unskilled head, the fitted line is almost horizontal, which means they have no linear relationship with property price. For the other variables, most of the points are clustered tightly near the origin and the trend of the fitted line is mainly influenced by outliers. The conclusion is the same: the relationship between these variables and property price is very weak.
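One way to put a number on "weak linear relationship" is the Pearson correlation with price. The sketch below uses made-up values chosen so that floor area tracks price closely and an UnskPct-like variable does not; it is not computed from the mortgage data.

```r
# Made-up values: price rises steadily with floor area, while the
# neighbourhood percentage bounces around with no clear trend.
price <- c(30000, 45000, 52000, 70000, 82000, 99000)
area  <- c(40, 60, 80, 100, 120, 140)
unsk  <- c(5, 0, 10, 0, 10, 5)

# Pearson correlation of each candidate predictor with price.
cors <- c(FlorArea = cor(area, price), UnskPct = cor(unsk, price))
print(round(cors, 3))
```

A value near 1 or -1 corresponds to points lying close to a sloped line; a value near 0 corresponds to the almost horizontal fitted lines described above.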

Exploration of Independent Variable (Categorical Variables)

Plot the Purprice versus CenHeat.

The plot shows that houses with central heating are priced higher than houses without it, although the difference in average price is not large. One plausible explanation is comfort: central heating is available at any time, whereas other forms of heating have to be set up before use.

Plot the Purprice versus Garage.

The median price of houses with a double garage is clearly much higher than that of houses with a single garage. Garage size also tends to go with house size: a one-room property is unlikely to have a double garage, which we would expect to find mainly on larger houses.

Plot the Purprice versus BathTwo.

Houses with two bathrooms are also priced higher on average, and the gap between one and two bathrooms is much larger than the gap for central heating. Intuitively, a second bathroom is more convenient, and houses with more than one bathroom are typically bigger because of their interior layout.

Plot the Purprice versus Bedrooms.

Finally, consider the number of bedrooms. Houses with one and two bedrooms do not differ by much, and even three bedrooms makes little difference to the median price. However, with four or five bedrooms the median price jumps markedly.

Plot the Purprice versus Age.

The next predictor is the age of the house. The plot shows that housing built before World War 1 has the greatest spread of prices. One possible reason is location: the earliest housing was built when the best sites were still available.

The type of house also influences the price. The dataset distinguishes detached homes, semi-detached homes, flats and bungalows. Detached homes have the highest prices, plausibly because they offer more privacy and a better layout. Semi-detached homes come next: they are still attractive but lack the privacy of a fully detached house. Flats come last, since there is little privacy if the insulation between units is poor.

In the plots above, we can see that the type of property is an important factor in its price. Properties with central heating tend to be more expensive, and the price shows an increasing trend as the number of garages, bathrooms and bedrooms goes up. However, the age of a property seems to have no clear influence on its price. Large houses clearly cost more, but few data are available for the largest houses: in the Purprice versus FlorArea plot, the left side is densely populated with points and the right side has far fewer.
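The comparisons of median price across categories can also be tabulated directly. The sketch below uses made-up prices and the Garage levels from the recoded dataset; it illustrates the kind of summary a box plot shows, not the actual figures.

```r
# Made-up prices for the three Garage levels used in the report.
toy <- data.frame(
  Purprice = c(60000, 70000, 80000, 90000, 120000, 150000),
  Garage = factor(
    c("HardStnd", "HardStnd", "GarSingl", "GarSingl", "GarDoubl", "GarDoubl"),
    levels = c("HardStnd", "GarSingl", "GarDoubl")
  )
)

# Median price within each level: the centre line of each box in a box plot.
med <- tapply(toy$Purprice, toy$Garage, median)
print(med)
```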

Fit Linear Models

With all the predictors examined, we move to a multiple linear regression model. We first use the lm() function in R to fit our models.

model.9v <-lm(Purprice~FlorArea+Bedrooms+Type+BathTwo+Garage+Tenfree+CenHeat+Age+ProfPct+NewPropD+NoCarHh+CarspP+UnskPct+RetiPct+Saleunem+Unemploy+PopnDnsy,data=MyData)

Written out, the model is:

Purprice = b0 + b1 FlorArea + b2 Bedrooms + b3 Type + … + b17 PopnDnsy

Given the characteristics of a London property as input, the fitted coefficients can be used to predict its price. We then want to find the predictors that have the most impact on price, which is done in three steps. First, a model is fitted for each predictor on its own and the models are compared by AIC. Second, a model is fitted with all predictors and the significant predictors are kept. Finally, a model is fitted with the significant predictors and the VIF of each predictor is checked to avoid collinearity.
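In the written-out equation, a term such as b3 Type is shorthand: a factor contributes one coefficient for every level except its reference level. The sketch below shows the design matrix that lm() would build for a made-up data frame; the rows are illustrative only.

```r
# Made-up rows; Type uses the levels of the recoded dataset, TypDetch first.
toy <- data.frame(
  FlorArea = c(76, 98, 125, 110),
  Type = factor(
    c("TypDetch", "TypSemiD", "TypFlat", "Bungalow"),
    levels = c("TypDetch", "TypSemiD", "TypFlat", "Bungalow")
  )
)

# model.matrix() expands the factor into 0/1 columns, one per non-reference level.
X <- model.matrix(~ FlorArea + Type, data = toy)
print(colnames(X))
```

The column names correspond to the coefficient names in the regression output (TypeTypSemiD, TypeTypFlat, TypeBungalow), each measured relative to detached houses.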

Fit for A Single Variable and Look at AICs

To choose significant variables for the model, we fit a separate model of the response on each predictor and report the AICs of these models in the table above. Floor area is the most important single predictor of property price. The number of bedrooms, the number of bathrooms and the property type also have a large impact on price.
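The one-predictor-at-a-time comparison can be sketched as follows. The data frame is made up, with column names borrowed from the dataset, and the loop is an assumption about how such a table might be produced rather than the original code.

```r
# Made-up data with a few columns named as in the dataset.
toy <- data.frame(
  Purprice = c(30000, 45000, 52000, 70000, 82000, 99000),
  FlorArea = c(40, 60, 80, 100, 120, 140),
  CenHeat  = factor(c(0, 1, 0, 1, 1, 1)),
  UnskPct  = c(5, 0, 10, 0, 10, 5)
)

# Fit Purprice ~ <predictor> for each predictor and record the AIC.
predictors <- setdiff(names(toy), "Purprice")
aic_by_predictor <- sapply(predictors, function(p) {
  AIC(lm(reformulate(p, response = "Purprice"), data = toy))
})

# Lower AIC means a better fit for the same response.
print(sort(aic_by_predictor))
```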

Fit Linear Model With All Predictors

## 
## Call:
## lm(formula = Purprice ~ ., data = MyData[, 3:20])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -136420  -13483   -1322   10340  371624 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       12216.791   3234.459   3.777 0.000159 ***
## Tenfree1           6215.453   1352.354   4.596 4.35e-06 ***
## CenHeat1          11856.974    754.564  15.714  < 2e-16 ***
## BathTwo1          24029.638   1202.984  19.975  < 2e-16 ***
## NewPropD1          1879.988   1545.371   1.217 0.223807    
## FlorArea            677.818     11.379  59.570  < 2e-16 ***
## ProfPct              45.261     24.909   1.817 0.069235 .  
## AgeBldIntWr        3996.542    657.059   6.082 1.22e-09 ***
## AgeBldPostW       -1134.041    975.441  -1.163 0.245017    
## AgeBld60s         -7318.654   1089.956  -6.715 1.97e-11 ***
## AgeBld70s         -6713.012   1164.455  -5.765 8.36e-09 ***
## AgeBld80s           357.120   1026.580   0.348 0.727941    
## TypeTypSemiD     -12406.759   1005.914 -12.334  < 2e-16 ***
## TypeTypFlat      -17328.944   1032.138 -16.789  < 2e-16 ***
## TypeBungalow      -5754.224   1658.808  -3.469 0.000524 ***
## GarageGarSingl     3773.963    614.680   6.140 8.52e-10 ***
## GarageGarDoubl     9279.791   1676.432   5.535 3.17e-08 ***
## BedroomsBedTwo    -3399.740    869.034  -3.912 9.20e-05 ***
## BedroomsBedThree  -7863.395   1068.092  -7.362 1.92e-13 ***
## BedroomsBedFour   -1709.352   1542.268  -1.108 0.267738    
## BedroomsBedFive    3973.657   2504.121   1.587 0.112573    
## NoCarHh             -12.783     30.913  -0.414 0.679247    
## CarspP              -19.105     40.555  -0.471 0.637592    
## UnskPct             -40.730     36.679  -1.110 0.266830    
## RetiPct              -7.091      5.618  -1.262 0.206928    
## Saleunem             54.662     60.992   0.896 0.370154    
## Unemploy              9.737      5.808   1.677 0.093629 .  
## PopnDnsy             42.660     40.423   1.055 0.291287    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 27150 on 12507 degrees of freedom
## Multiple R-squared:  0.5652, Adjusted R-squared:  0.5643 
## F-statistic: 602.2 on 27 and 12507 DF,  p-value: < 2.2e-16

The linear model is then fitted with all predictors. The output shows that the proportion of households without a car (NoCarHh), cars per person in the neighbourhood (CarspP), the proportion of households with a professional head (ProfPct), the proportion of households with an unskilled head (UnskPct), the proportion of retired residents (RetiPct), Saleunem, unemployed workers (Unemploy), the new-property indicator (NewPropD) and local population density (PopnDnsy) are not significant at the 5% level. This agrees with the correlation coefficient table, so these variables are removed from the model.

Fit Linear Model With Significant Predictors and Check VIF

## 
## Call:
## lm(formula = Purprice ~ Tenfree + CenHeat + BathTwo + FlorArea + 
##     Age + Type + Garage + Bedrooms, data = MyData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -135607  -13414   -1328   10382  371167 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       12225.50    2113.53   5.784 7.45e-09 ***
## Tenfree1           6140.30    1352.02   4.542 5.64e-06 ***
## CenHeat1          11854.63     754.65  15.709  < 2e-16 ***
## BathTwo1          24053.12    1202.55  20.002  < 2e-16 ***
## FlorArea            678.08      11.37  59.613  < 2e-16 ***
## AgeBldIntWr        4051.07     656.87   6.167 7.16e-10 ***
## AgeBldPostW       -1136.03     975.15  -1.165 0.244050    
## AgeBld60s         -7336.41    1089.76  -6.732 1.75e-11 ***
## AgeBld70s         -6707.95    1164.43  -5.761 8.57e-09 ***
## AgeBld80s           935.35     898.71   1.041 0.298003    
## TypeTypSemiD     -12381.18    1005.73 -12.311  < 2e-16 ***
## TypeTypFlat      -17269.10    1031.65 -16.739  < 2e-16 ***
## TypeBungalow      -5695.79    1658.13  -3.435 0.000594 ***
## GarageGarSingl     3776.80     614.47   6.146 8.16e-10 ***
## GarageGarDoubl     9257.72    1676.29   5.523 3.40e-08 ***
## BedroomsBedTwo    -3450.07     869.10  -3.970 7.24e-05 ***
## BedroomsBedThree  -7869.64    1068.21  -7.367 1.85e-13 ***
## BedroomsBedFour   -1743.82    1541.90  -1.131 0.258095    
## BedroomsBedFive    3937.18    2504.29   1.572 0.115935    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 27150 on 12516 degrees of freedom
## Multiple R-squared:  0.5647, Adjusted R-squared:  0.564 
## F-statistic: 901.9 on 18 and 12516 DF,  p-value: < 2.2e-16
##  Tenfree  CenHeat  BathTwo FlorArea      Age     Type   Garage Bedrooms 
##    6.722    1.030    1.253    3.032    1.561    9.769    1.480    4.112

A model is built with all significant predictors and collinearity is checked with the VIF. In the table above, the VIF of property type is very high (9.769), so Type is removed from the model. In the next step, the dataset is separated into training and testing data; the linear model is built on the training dataset and evaluated on the testing dataset.

Fit Linear Model and Test Accuracy
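For numeric predictors, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The sketch below implements this definition in base R on made-up data. `vif_manual()` is a hypothetical helper; the source does not show which function produced the values above, and factor predictors such as Type would need a generalised VIF instead.

```r
# vif_manual() is a hypothetical helper for data frames of numeric predictors.
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    r2 <- summary(lm(reformulate(others, response = v), data = df))$r.squared
    1 / (1 - r2)
  })
}

# Made-up predictors: Rooms is almost proportional to FlorArea, PopnDnsy is not.
toy <- data.frame(
  FlorArea = c(40, 60, 80, 100, 120, 140, 90, 70),
  Rooms    = c(2, 3, 4, 5, 6, 7, 4, 4),
  PopnDnsy = c(8, 3, 11, 5, 9, 4, 12, 6)
)

vifs <- vif_manual(toy)
print(round(vifs, 2))
```

The two nearly proportional columns get large VIFs, which is the pattern that motivates dropping one of a pair of overlapping predictors.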

## 
## Call:
## lm(formula = Purprice ~ FlorArea + Bedrooms + BathTwo + Garage + 
##     Tenfree + CenHeat + Age, data = trainData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -131677  -13606   -1649   10507  364562 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        2386.22    1529.39   1.560 0.118744    
## FlorArea            715.26      14.97  47.766  < 2e-16 ***
## BedroomsBedTwo    -5323.44    1148.02  -4.637 3.59e-06 ***
## BedroomsBedThree -10485.73    1398.61  -7.497 7.26e-14 ***
## BedroomsBedFour   -2291.56    2049.23  -1.118 0.263493    
## BedroomsBedFive    5954.67    3305.12   1.802 0.071640 .  
## BathTwo1          24198.54    1585.61  15.261  < 2e-16 ***
## GarageGarSingl     6137.83     793.09   7.739 1.13e-14 ***
## GarageGarDoubl    15060.80    2180.87   6.906 5.40e-12 ***
## Tenfree1          -1618.72     939.73  -1.723 0.085013 .  
## CenHeat1          12268.51    1000.50  12.262  < 2e-16 ***
## AgeBldIntWr        5663.90     858.19   6.600 4.40e-11 ***
## AgeBldPostW        2312.66    1279.79   1.807 0.070791 .  
## AgeBld60s         -5397.07    1436.10  -3.758 0.000172 ***
## AgeBld70s         -5712.37    1522.99  -3.751 0.000178 ***
## AgeBld80s          3592.95    1174.30   3.060 0.002224 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 27920 on 7505 degrees of freedom
## Multiple R-squared:  0.5553, Adjusted R-squared:  0.5544 
## F-statistic: 624.8 on 15 and 7505 DF,  p-value: < 2.2e-16
## [1] 777827779
## [1] 723418875

The two values printed after the model summary are the mean squared errors on the training and testing datasets. The mean squared error on the testing dataset is 723418875, which is slightly lower than that on the training dataset (777827779). For floor area, each additional square metre increases the average property price by 715.26 GBP, keeping the other predictors constant. The average price of properties with central heating is higher than that of properties without it by 12268.51 GBP, keeping the other predictors constant.
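The split-and-evaluate step can be sketched as below on simulated data. The 60/40 split is inferred from the 7521 of 12535 rows that appear in the training output; the seed, the simulated prices and the `mse()` helper are assumptions for illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Simulated data: price is roughly linear in floor area plus noise.
n <- 200
toy <- data.frame(FlorArea = runif(n, 40, 200))
toy$Purprice <- 2000 + 700 * toy$FlorArea + rnorm(n, sd = 20000)

# Random 60/40 split into training and testing rows.
train_idx <- sample(seq_len(n), size = round(0.6 * n))
trainData <- toy[train_idx, ]
testData  <- toy[-train_idx, ]

# Fit on the training rows only, then score both sets.
fit <- lm(Purprice ~ FlorArea, data = trainData)
mse <- function(actual, predicted) mean((actual - predicted)^2)
mse_train <- mse(trainData$Purprice, fitted(fit))
mse_test  <- mse(testData$Purprice, predict(fit, newdata = testData))
print(c(train = mse_train, test = mse_test))
```

A testing error close to the training error, as in the report, suggests that the model is not overfitting the training rows.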

Spatial Variation

Fit model with variables Easting and Northing

## 
## Call:
## lm(formula = Purprice ~ x + y + I(x^2) + I(y^2) + I(x * y), data = MyData)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -73924 -24782  -9828   9862 444261 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.153e+06  8.741e+05  -3.607 0.000311 ***
## x            1.225e+04  2.793e+03   4.387 1.16e-05 ***
## y            3.525e+02  2.916e+03   0.121 0.903766    
## I(x^2)      -1.074e+01  2.555e+00  -4.203 2.66e-05 ***
## I(y^2)       7.372e+00  4.717e+00   1.563 0.118080    
## I(x * y)    -5.727e+00  4.323e+00  -1.325 0.185350    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 41050 on 12529 degrees of freedom
## Multiple R-squared:  0.004172,   Adjusted R-squared:  0.003774 
## F-statistic:  10.5 on 5 and 12529 DF,  p-value: 4.507e-10

A model is fitted with Easting and Northing (x and y in the output) to test whether location significantly influences property prices. The result shows that properties tend to have a lower price as we move east and that this influence is statistically significant, although the model on its own explains very little of the variance (R-squared 0.004). It is therefore necessary to consider the geographic effect when predicting property prices.
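The model above is a second-order trend surface: price as a quadratic function of the two coordinates. The sketch below fits the same formula to a made-up, noise-free surface on a small centred grid so that the recovered coefficients can be checked by eye. How x and y were derived from Easting and Northing is not shown in the source, so the coordinates here are purely illustrative.

```r
# A 5 x 5 grid of centred coordinates (illustrative, not real map units).
grid <- expand.grid(x = -2:2, y = -2:2)

# Made-up, noise-free quadratic surface with known coefficients.
grid$Purprice <- 100 - 3 * grid$x + 2 * grid$y -
  1 * grid$x^2 + 0.5 * grid$y^2 + 0.25 * grid$x * grid$y

# Same formula as in the report's trend-surface model.
trend <- lm(Purprice ~ x + y + I(x^2) + I(y^2) + I(x * y), data = grid)
print(round(coef(trend), 3))
```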

Load Borough Data

## OGR data source with driver: ESRI Shapefile 
## Source: "E:\AcademicYear\semester2\3_CaseStudies\project\git\GY683", layer: "LondonBoroughs"
## with 33 features
## It has 15 fields
## Integer64 fields read as strings:  NUMBER NUMBER0 POLYGON_ID UNIT_ID

Property Price Versus Borough

In the plot above, we can see that the median property price differs between London boroughs. In the City of London especially, the property price is markedly higher than in other boroughs.

Standardised Residuals Versus Borough

The plot of standardised residuals by borough leads to the same conclusion: the distribution of residuals differs between boroughs. If the model took the borough effect into account, the result might be better. We will now run a geographically weighted regression model to see how the coefficients of the model might vary across London.

First we will calibrate the bandwidth of the kernel that will be used to capture the points for each regression (this may take a little while) and then run the model:

Geographically Weighted Regression (GWR)

##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2020-05-10 00:06:28 
##    Call:
##    gwr.basic(formula = Purprice ~ FlorArea + Bedrooms + BathTwo + 
##     Garage + Tenfree + CenHeat + Age, data = map, bw = bw, kernel = "gaussian")
## 
##    Dependent (y) variable:  Purprice
##    Independent variables:  FlorArea Bedrooms BathTwo Garage Tenfree CenHeat Age
##    Number of data points: 7521
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##     Min      1Q  Median      3Q     Max 
## -131677  -13606   -1649   10507  364562 
## 
##    Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
##    (Intercept)        2386.22    1529.39   1.560 0.118744    
##    FlorArea            715.26      14.97  47.766  < 2e-16 ***
##    BedroomsBedTwo    -5323.44    1148.02  -4.637 3.59e-06 ***
##    BedroomsBedThree -10485.73    1398.61  -7.497 7.26e-14 ***
##    BedroomsBedFour   -2291.56    2049.23  -1.118 0.263493    
##    BedroomsBedFive    5954.67    3305.12   1.802 0.071640 .  
##    BathTwo1          24198.54    1585.61  15.261  < 2e-16 ***
##    GarageGarSingl     6137.83     793.09   7.739 1.13e-14 ***
##    GarageGarDoubl    15060.80    2180.87   6.906 5.40e-12 ***
##    Tenfree1          -1618.72     939.73  -1.723 0.085013 .  
##    CenHeat1          12268.51    1000.50  12.262  < 2e-16 ***
##    AgeBldIntWr        5663.90     858.19   6.600 4.40e-11 ***
##    AgeBldPostW        2312.66    1279.79   1.807 0.070791 .  
##    AgeBld60s         -5397.07    1436.10  -3.758 0.000172 ***
##    AgeBld70s         -5712.37    1522.99  -3.751 0.000178 ***
##    AgeBld80s          3592.95    1174.30   3.060 0.002224 ** 
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 27920 on 7505 degrees of freedom
##    Multiple R-squared: 0.5553
##    Adjusted R-squared: 0.5544 
##    F-statistic: 624.8 on 15 and 7505 DF,  p-value: < 2.2e-16 
##    ***Extra Diagnostic information
##    Residual sum of squares: 5.850043e+12
##    Sigma(hat): 27893.27
##    AIC:  175347.7
##    AICc:  175347.8
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: gaussian 
##    Fixed bandwidth: 6103.785 
##    Regression points: the same locations as observations are used.
##    Distance metric: Euclidean distance metric is used.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                           Min.    1st Qu.     Median    3rd Qu.     Max.
##    Intercept        -11554.796    254.395   4904.294   7410.595 11034.18
##    FlorArea            612.043    676.780    707.772    732.984   821.42
##    BedroomsBedTwo   -12787.458  -6204.371  -4945.664  -3734.715 -1174.21
##    BedroomsBedThree -20027.410 -12955.621 -10648.990  -7608.165 -3183.76
##    BedroomsBedFour  -16583.906  -7625.159  -3919.857   3315.564  9841.25
##    BedroomsBedFive  -27505.494 -11882.768   8012.577  16527.424 80111.69
##    BathTwo1           8647.801  21953.435  25329.002  29390.062 39318.85
##    GarageGarSingl     2231.026   4538.113   5867.882   8153.808 11417.84
##    GarageGarDoubl     5302.221  12128.844  15679.468  18372.378 28602.52
##    Tenfree1          -7362.917  -3018.066  -1110.351    655.130  4225.97
##    CenHeat1           6789.054  10087.689  12008.319  13652.972 17671.62
##    AgeBldIntWr          66.841   1648.497   3842.940   8534.553 13274.22
##    AgeBldPostW       -4929.118  -2849.870   -355.913   5982.302 14275.50
##    AgeBld60s        -13142.216  -8457.267  -5520.874  -2885.553  1271.68
##    AgeBld70s        -11356.027  -9566.743  -7285.795  -4238.382  3812.07
##    AgeBld80s         -2849.667   -290.098   2618.808   5548.411 13068.94
##    ************************Diagnostic information*************************
##    Number of data points: 7521 
##    Effective number of parameters (2trace(S) - trace(S'S)): 182.7964 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 7338.204 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 175082.7 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 174949 
##    Residual sum of squares: 5.479639e+12 
##    R-square value:  0.5834657 
##    Adjusted R-square value:  0.5730883 
## 
##    ***********************************************************************
##    Program stops at: 2020-05-10 00:07:38

The output from the GWR model reveals how the coefficients vary across the 33 boroughs of London. The global coefficients are exactly the same as the coefficients of the earlier linear model fitted to the training data. Taking floor area as an example, the local coefficients range from a minimum of 612.043 GBP (a 1 square metre increase in floor area raises the average property price by 612.043 GBP) to a maximum of 821.42 GBP. For half of the observations in the dataset, a 1 square metre increase in floor area raises the property price by between 676.780 GBP and 732.984 GBP (the inter-quartile range between the 1st Qu. and the 3rd Qu.).

Coefficient ranges can also be seen for the other variables, and they suggest some interesting spatial patterning. To explore this we can plot the GWR coefficients for different variables. First we attach the coefficients to our original dataframe. This is straightforward because the coefficients for each property appear in the same order in the spatial points dataframe as in the original dataframe.

The first plot is for the floor area coefficients. In the boroughs north of the city centre, a 1 square metre increase in floor area corresponds to the largest change in property price, whereas in the boroughs south of the city centre it corresponds to the smallest change. This is an interesting pattern. It may partly be explained if buyers in the boroughs north of the city centre place a higher value on floor area, so that floor area has a stronger influence on price there.

The second plot is for central heating. In the west and east of London, having central heating changes the price by less than 10,000 GBP. In the boroughs north and south of the city centre, central heating matters much more: the price is 12,500 to 17,500 GBP higher than for properties without it.

Similar effects can be seen for the other predictors in the model: they have different coefficients in different boroughs.
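The idea behind these local coefficients can be sketched in base R: at each location, fit a weighted least-squares regression in which nearby observations receive larger weights. The sketch below is not the GWmodel implementation and skips bandwidth calibration. The helper `gwr_sketch()`, the Gaussian weight form exp(-0.5 (d / bw)^2) and the toy data are assumptions for illustration.

```r
# gwr_sketch() is a hypothetical helper: one weighted regression per location.
gwr_sketch <- function(formula, data, coords, bw) {
  X <- model.matrix(formula, data)
  y <- model.response(model.frame(formula, data))
  t(sapply(seq_len(nrow(data)), function(i) {
    # Euclidean distance from location i to every observation.
    d <- sqrt((coords[, 1] - coords[i, 1])^2 + (coords[, 2] - coords[i, 2])^2)
    w <- exp(-0.5 * (d / bw)^2)   # Gaussian kernel weights (assumed form)
    lm.wfit(X, y, w)$coefficients # local coefficient estimates at location i
  }))
}

# Toy 10 x 10 grid: the true price per square metre rises from west to east.
grid <- expand.grid(x = 0:9, y = 0:9)
grid$FlorArea <- 50 + 10 * ((3 * grid$y) %% 10)         # made-up floor areas
grid$Purprice <- 1000 + (600 + 20 * grid$x) * grid$FlorArea

local <- gwr_sketch(Purprice ~ FlorArea, grid, as.matrix(grid[, c("x", "y")]), bw = 2)
print(round(range(local[, "FlorArea"]), 1))  # spread of local floor-area slopes
```

With a very large bandwidth every observation gets nearly the same weight and each local fit collapses to the global regression, which is why the GWR output reports the global model alongside the local summary.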

Conclusion

In conclusion, the most reliable determinants of property prices are floor area, the number of bedrooms, having two bathrooms, the garage provision, central heating, the leasehold/freehold indicator and the age of the property. Although the global model with these predictors gives a good result for predicting property prices, it does not consider the spatial component. The GWR model fits the data better than the global model, with a lower AICc (175082.7 against 175347.8) and a higher adjusted R-squared (0.573 against 0.554), which suggests that GWR is a better way to estimate property prices.